On the Impact of Random Index-Partitioning on Index Compression
Abstract
The performance of processing search queries depends heavily on the stored index size. Accordingly, considerable research effort has been devoted to the development of efficient compression techniques for inverted indexes. Roughly, index compression relies on two factors: the ordering of the indexed documents, which strives to position similar documents in proximity, and the encoding of the inverted lists that result from the ordered stream of documents. Large commercial search engines index tens of billions of pages of the ever-growing Web. The sheer size of their indexes dictates the distribution of documents among thousands of servers in a scheme called local index-partitioning, such that each server indexes only several million pages. Due to engineering and runtime performance considerations, random distribution of documents to servers is common. However, random index-partitioning among many servers adversely impacts the resulting index sizes, as it decreases the effectiveness of document ordering schemes. We study the impact of random index-partitioning on document ordering schemes. We show that index-partitioning decreases the aggregated size of the inverted lists logarithmically with the number of servers when documents within each server are randomly reordered. On the other hand, the aggregated partitioned index size increases logarithmically with the number of servers when state-of-the-art document ordering schemes, such as lexical URL sorting and clustering with TSP, are applied. Finally, we justify the common practice of randomly distributing documents to servers, as we qualitatively show that, despite its ill effects on the ensuing compression, it decreases key factors in distributed query evaluation time by an order of magnitude compared with partitioning techniques that compress better.
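The two factors named in the abstract, document ordering and inverted-list encoding, can be illustrated with a minimal sketch. It assumes d-gap encoding with variable-byte codes, which the abstract does not specify, and every function name here (`d_gaps`, `vbyte_len`, `encoded_size`, `random_partition`, `partitioned_size`) is hypothetical. It is not the paper's method or experimental setup, only a toy illustration of why an ordering that places similar documents close together yields smaller lists.

```python
import random


def d_gaps(doc_ids):
    """Turn a posting list of document IDs (>= 1) into gaps between sorted IDs."""
    prev, out = 0, []
    for d in sorted(doc_ids):
        out.append(d - prev)
        prev = d
    return out


def vbyte_len(n):
    """Bytes needed for n under variable-byte coding (7 payload bits per byte)."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def encoded_size(doc_ids):
    """Total bytes for one posting list: d-gaps, each gap variable-byte coded."""
    return sum(vbyte_len(g) for g in d_gaps(doc_ids))


def random_partition(num_docs, num_servers, seed=0):
    """Assign global IDs 1..num_docs to servers uniformly at random.

    Returns {global_id: (server, local_id)}; local IDs keep the global
    order within each server (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    next_local = [0] * num_servers
    mapping = {}
    for doc in range(1, num_docs + 1):
        s = rng.randrange(num_servers)
        next_local[s] += 1
        mapping[doc] = (s, next_local[s])
    return mapping


def partitioned_size(doc_ids, mapping, num_servers):
    """Aggregate encoded size of one term's posting list across all servers."""
    local_lists = [[] for _ in range(num_servers)]
    for d in doc_ids:
        s, local = mapping[d]
        local_lists[s].append(local)
    return sum(encoded_size(lst) for lst in local_lists)


# Same number of postings (100), two different document orderings.
clustered = list(range(1000, 1100))          # similar documents adjacent
scattered = list(range(1000, 100001, 1000))  # similar documents far apart
print(encoded_size(clustered))  # 101
print(encoded_size(scattered))  # 200
```

In this toy setting, the clustered list costs about half as many bytes as the scattered one, because nearly all of its gaps fit in a single byte. `partitioned_size` can be used with a mapping from `random_partition` to observe how the aggregate size of a list changes as the number of servers grows; the outcome depends on the ordering and encoding chosen, so no particular trend is asserted here.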
Journal:
- CoRR
Volume: abs/1107.5661
Publication date: 2011